ABSTRACT

Insight into a number of interesting questions in cosmology can be obtained by studying the first crossing distributions of physically motivated barriers by random walks with correlated steps: higher mass objects are associated with walks that take fewer steps before crossing the barrier. We show how to write the first crossing distribution as a formal series, ordered by the minimum number of times a walk upcrosses the barrier. Since walks with many upcrossings are negligible if the walk has not taken too many steps, the leading order term in this series is the most relevant for understanding the massive objects of most interest in cosmology. For walks associated with Gaussian random fields, this first term only requires knowledge of the bivariate distribution of the walk height and slope, and provides an excellent approximation to the first crossing distribution for all barriers and smoothing filters of current interest. We show that this simplicity survives when extending the approach to the case of non-Gaussian random fields. Although this second part of our analysis is motivated by the possibility that the primordial fluctuation field is non-Gaussian, our results are general. In particular, they do not assume the non-Gaussianity is small, so they may be viewed as the solution to an excursion set analysis of the late-time, nonlinear fluctuation field rather than the initial one.
1 INTRODUCTION

The statistical distribution of gravitationally bound objects in the Universe is a powerful tool for constraining the amount of primordial non-Gaussianity, thus helping shed some light on the physics of the very early times. The dependence on mass of the abundance and spatial correlations of collapsed objects are useful and complementary tools for probing non-Gaussianity on different scales, in particular, scales that are smaller than those accessible with CMB observations.
The excursion set approach (Bond et al. 1991) provides an analytical framework for linking the statistics of haloes to fluctuations in the primordial density field. In this approach, one studies the overdensity field δ smoothed on the scale R,

$$ \delta(\mathbf{x},R) = \int d\mathbf{y}\; W_R(\mathbf{x}-\mathbf{y})\,\delta(\mathbf{y}), \qquad (1) $$

where W_R is a filter that goes to zero for |x − y| ≫ R. At a given (randomly chosen) position in space the evolution of δ_R as a function of (the inverse of) R resembles a random trajectory. Repeating this for every position in space gives an ensemble of trajectories, each starting from zero (homogeneity demands δ = 0 for infinitely large smoothing scales). For each trajectory, one looks for the largest R (if any) for which the value of the smoothed density field lies above some threshold value (which may itself depend on R). An object of mass M ∝ R³ is then associated with that trajectory.

If dn/dM denotes the comoving number density of haloes of mass M, then the mass fraction in such haloes is (M/ρ̄) dn/dM, where ρ̄ is the comoving background density. The excursion set approach assumes that this mass fraction equals the fraction of walks which cross the threshold (the "barrier") for the first time when the smoothing scale is R:

$$ f(R)\,dR = \frac{M}{\bar\rho}\,\frac{dn}{dM}\,dM. \qquad (2) $$



Although recent work has focussed on the shortcomings of this ansatz (Paranjape & Sheth 2012), the first crossing distribution is nevertheless expected to provide substantial insight into the dependence of dn/dM on cosmological parameters. In any case, the question of how the first crossing distribution depends on the nature of the underlying fluctuation field is interesting in its own right.

A crucial part of the problem is to avoid double counting trajectories, i.e., to discard at lower scales all trajectories that have already crossed at larger scales (since they are already associated with an object of larger mass, given by the largest scale on which the trajectory crossed the barrier). This is rather straightforward to implement numerically, but hard to deal with analytically. Indeed, exact solutions are known only for the unrealistic case of walks with uncorrelated steps (for Gaussian fields, this corresponds to a smoothing filter that is a sharp step function in Fourier space) and only for a few specific barriers. Considerable effort has been devoted to finding satisfactory analytical approximations, or fitting formulae, for the generic case in which steps are correlated.
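The one case with a known exact answer can be checked by brute force. The sketch below is a minimal illustration that assumes a constant barrier δc and independent Gaussian steps of variance Δs (the sharp-k case): it follows each walk until it first reaches the barrier and compares the crossed fraction with the classic result f(s) = δc exp(−δc²/2s)/√(2πs³), whose integral up to s_max is erfc(δc/√(2 s_max)). Function names and parameter values are illustrative choices, not part of the original analysis; discrete steps miss some crossings that happen between grid points, so the Monte Carlo fraction sits slightly below the exact one.

```python
import math
import random


def mc_crossed_fraction(delta_c, s_max, ds, n_walks, seed=0):
    """Fraction of uncorrelated Gaussian walks (sharp-k filter) that have crossed
    a constant barrier delta_c by the time their variance reaches s_max."""
    rng = random.Random(seed)
    n_steps = int(round(s_max / ds))
    step_sigma = math.sqrt(ds)  # independent steps, each adding variance ds
    crossed = 0
    for _ in range(n_walks):
        delta = 0.0  # every walk starts at delta = 0 when s = 0
        for _ in range(n_steps):
            delta += rng.gauss(0.0, step_sigma)
            if delta >= delta_c:  # first crossing found: stop following this walk
                crossed += 1
                break
    return crossed / n_walks


def exact_crossed_fraction(delta_c, s_max):
    """Integral from 0 to s_max of f(s) = delta_c exp(-delta_c^2/2s) / sqrt(2 pi s^3)."""
    return math.erfc(delta_c / math.sqrt(2.0 * s_max))


mc = mc_crossed_fraction(delta_c=1.0, s_max=4.0, ds=0.01, n_walks=3000)
exact = exact_crossed_fraction(1.0, 4.0)
print(f"Monte Carlo: {mc:.3f}   exact: {exact:.3f}")
```

Shrinking ds reduces the discreteness bias at the cost of run time.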

The problem is potentially even harder for non-Gaussian initial conditions, since different Fourier modes of the field become coupled, and this introduces additional correlations between the steps, whatever the smoothing filter. Moreover, the largest non-Gaussian deviations are likely to be in the (massive object) tail of the distribution. In this regime, perturbative expansions around the Gaussian result are likely to blow up, so they must be handled with care (D'Amico et al. 2011).

In this paper we provide a simple analytic approximation scheme that works for a broad variety of barriers and filters, and can be implemented up to an arbitrary level of precision for any (Gaussian or non-Gaussian) distribution of the underlying matter density field. The general formalism is presented in Section 2, and explicit calculations are carried out in Section 3, where we summarize our previous work on Gaussian fields and show how to extend it to non-Gaussian fields. Section 4 shows how to use our results as the basis of an excursion set study based on the late-time, nonlinear (rather than initial) fluctuation field. A final section summarizes our results. Appendix A discusses how to go beyond the simplest approximation we present in the main text, and our use of the Edgeworth and related expansions for approximating non-Gaussian distributions is summarized in Appendix B.



2 FIRST CROSSING DISTRIBUTION WITH CORRELATED STEPS

In hierarchical models, the variance s ≡ ⟨δ²(R)⟩ of the density field δ when smoothed on scale R vanishes by definition for R = ∞, and it grows monotonically for smaller R (note that ⟨δ⟩ = 0 for any R), according to

$$ s(R) = \int \frac{dk}{k}\,\frac{k^3 P(k)}{2\pi^2}\,\tilde W^2(kR), \qquad (3) $$

where P(k) is the power spectrum of δ and W̃ is the Fourier transform of the filter. Therefore, R and s can be used interchangeably, and it is in fact customary and convenient to study the walks as a function of s rather than R, as this has the advantage of hiding the dependence on the power spectrum and the smoothing filter. These only appear when the actual relation between s and R is needed.
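As an illustration of the variance integral that defines s(R), the sketch below evaluates it for a real-space top-hat filter, W̃(x) = 3(sin x − x cos x)/x³, and a purely hypothetical power spectrum P(k) = k e^{−k²}, chosen only so that the integral converges quickly; neither choice comes from the text. It confirms numerically that s decreases as R grows, which is what allows s to replace R as the "time" variable of the walk.

```python
import math


def top_hat_window(x):
    """Fourier transform of a real-space spherical top-hat filter."""
    if x < 1e-3:
        return 1.0 - x * x / 10.0  # series expansion avoids cancellation near x = 0
    return 3.0 * (math.sin(x) - x * math.cos(x)) / x**3


def toy_power(k):
    """Hypothetical toy power spectrum, used only for illustration."""
    return k * math.exp(-k * k)


def variance(R, k_max=10.0, n=4000):
    """s(R) = int dk/k k^3 P(k) W^2(kR) / (2 pi^2), by the trapezoidal rule."""
    h = k_max / n
    total = 0.0
    for i in range(n + 1):
        k = i * h
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * k * k * toy_power(k) * top_hat_window(k * R) ** 2
    return total * h / (2.0 * math.pi**2)


for R in (0.5, 1.0, 2.0, 4.0):
    print(R, variance(R))  # s falls monotonically as the smoothing scale grows
```

For this toy spectrum the R → 0 limit is ∫k³e^{−k²}dk/(2π²) = 1/(4π²), which gives a simple consistency check.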

What we are after is the first crossing rate, i.e. the probability that a walk δ crosses the barrier b(s) for the first time at some scale s. In other words, we want to compute the probability that δ(s) > b(s) at s but δ(s₁) < b(s₁) for all s₁ < s, knowing the probability distribution p(δ;s) of the walk values at any s. In general, requiring δ(s) > b(s) is straightforward, whereas the additional constraint on the walk heights for all s₁ < s is difficult to treat analytically.

2.1 Height alone 

In one of the earliest works on this subject, Press & Schechter (1974) simply ignored this constraint, and estimated f(s) as

$$ f_{\rm cc}(s) = -\frac{d}{ds}\int_{-\infty}^{b(s)} d\delta\; p(\delta;s). \qquad (4) $$

(The reason for the subscript cc will become clear shortly.) Strictly speaking, Press & Schechter ended up multiplying the right hand side of this expression by a factor of 2, and they only studied the special case in which b = δc is independent of s. (The extension to barriers which decrease monotonically as s increases is trivial; if the barrier increases sufficiently rapidly with s, then one must be a little more careful, as we discuss shortly.) That this does not impose any constraint on the walk values at larger scales (smaller s) is a point which was highlighted by Bond et al. (1991). In fact, this formulation does not even distinguish between trajectories crossing the threshold upwards or downwards, a point to which we return shortly.
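For a Gaussian field the cumulative probability below the barrier is Φ(b/√s), with Φ the unit normal CDF, so equation (4) can be checked by differentiating numerically. The sketch below assumes a hypothetical linear barrier b(s) = δc + βs, purely for illustration, and compares a central-difference derivative with the hand-derived form p(ν)(b/2s − b')/√s, where ν = b/√s. It also makes the caveat about rapidly rising barriers concrete: once b' > b/2s this "rate" turns negative.

```python
import math


def gaussian_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def f_cc_numeric(barrier, s, h=1e-5):
    """Equation (4) for a Gaussian field of variance s, by central differences:
    f_cc = -d/ds int_{-inf}^{b(s)} p(delta; s) d delta = -d/ds Phi(b(s)/sqrt(s))."""
    F = lambda t: gaussian_cdf(barrier(t) / math.sqrt(t))
    return -(F(s + h) - F(s - h)) / (2.0 * h)


def f_cc_closed(barrier, dbarrier, s):
    """Same quantity differentiated by hand: p(nu) (b/2s - b') / sqrt(s), nu = b/sqrt(s)."""
    nu = barrier(s) / math.sqrt(s)
    p_nu = math.exp(-0.5 * nu * nu) / math.sqrt(2.0 * math.pi)
    return p_nu * (barrier(s) / (2.0 * s) - dbarrier(s)) / math.sqrt(s)


# Hypothetical linear moving barrier b(s) = delta_c + beta * s, for illustration only.
delta_c, beta = 1.686, 0.2
b = lambda s: delta_c + beta * s
db = lambda s: beta
print(f_cc_numeric(b, 2.0), f_cc_closed(b, db, 2.0))
```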

Recently Paranjape, Lam & Sheth (2012) noted that there is an interesting and instructive limit in which equation (4) is exact. Consider the set of smooth deterministic curves having δ ∝ √s. Each of these curves represents what they called a completely correlated walk: one which is a monotonic function of s whose amplitude is set by a single number, the constant of proportionality, which one may take to be the height on scale s = 1. If the distribution of this constant is specified on one scale (say s = 1), then the distribution of δ on another scale, p(δ;s), is simply related to p(δ;1), namely p(δ;s) = p(δ/√s;1)/√s, and, for this family of curves, equation (4) is exact; hence the subscript cc. This limit is interesting because, regardless of the filter and the matter power spectrum, all correlated walks tend to deterministic trajectories with δ ∝ √s as s → 0. Thus, in this (large mass) limit, equation (4) is exact, explaining the numerical results of Bond et al. (1991).
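The completely correlated limit is easy to simulate, because each walk is fixed by a single Gaussian number. The sketch below assumes a Gaussian field and a constant barrier: a walk δ = c√s with c > 0 first reaches δc at s* = (δc/c)², so the fraction that has crossed by s_max should equal ∫₀^{s_max} f_cc ds = 1 − Φ(δc/√s_max). Names and parameter values are illustrative.

```python
import math
import random


def cc_crossed_fraction(delta_c, s_max, n_walks, seed=1):
    """Completely correlated walks delta(s) = c sqrt(s), with c the height at s = 1.
    A walk with c > 0 first reaches delta_c at s* = (delta_c / c)^2."""
    rng = random.Random(seed)
    crossed = 0
    for _ in range(n_walks):
        c = rng.gauss(0.0, 1.0)  # Gaussian field: c ~ N(0, 1) because <delta^2> = s
        if c > 0.0 and (delta_c / c) ** 2 <= s_max:
            crossed += 1
    return crossed / n_walks


def integrated_f_cc(delta_c, s_max):
    """int_0^{s_max} f_cc ds = 1 - Phi(delta_c / sqrt(s_max)) for a constant barrier."""
    return 0.5 * math.erfc(delta_c / math.sqrt(2.0 * s_max))


print(cc_crossed_fraction(1.686, 4.0, 50000), integrated_f_cc(1.686, 4.0))
```

Unlike the uncorrelated walks, these deterministic walks never cross downwards, which is why equation (4) is exact for them.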

At larger values of s, the completely correlated limit is no longer so accurate. However, although small fluctuations around these trajectories appear, most walks still remain monotonic functions of s. Therefore the contribution to p(δ;s) from walks criss-crossing the barrier multiple times is still negligible, so the constraint δ(s₁) < b(s₁) for s₁ < s is automatically satisfied.

2.2 Upcrossing requires both height and slope 

As s gets larger, one must account for larger and larger fluctuations away from the deterministic trajectories. Musso & Sheth (2012) argued that the most efficient way of doing so, at fixed height δ on scale s, is to consider fluctuations in the slope v = dδ/ds (v for velocity). (For the ensemble of deterministic walks, the distribution of the slope v at fixed height δ is a delta function centered on δ/2s.) Since one only wants to count walks that are crossing the barrier upwards, to the condition that δ = b(s) one should add the requirement that v > db/ds (for a barrier of constant height, this is just v > 0).
Thus, if earlier upcrossings can be neglected, Musso & Sheth showed that f(s) can be computed from the joint probability p(δ,v;s) that a walk reaches δ at scale s with velocity v, as

$$ f(s) \simeq \int_{b'(s)}^{\infty} dv\,\big[v - b'(s)\big]\; p\big(b(s),v;s\big). \qquad (5) $$

Clearly, this formulation fails to discard those walks that were above threshold at some s₁ < s, but with δ(s₂) < b(s₂) for s₁ < s₂ < s, i.e. walks with more than one upcrossing. However, at small s, the fraction of such walks is tiny, since the correlations between the steps make sharp turns very unlikely.

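Since p(b,v;s) = ⟨δ_D(δ − b) δ_D(δ' − v)⟩, where δ_D is the Dirac delta function, this approximation can also be written as

$$ f(s) \simeq \left\langle \frac{\partial\,\vartheta(\delta - b)}{\partial s}\;\vartheta(\delta' - b') \right\rangle, \qquad (6) $$

where ϑ is the Heaviside step function, so that ∂ϑ(δ − b)/∂s = δ_D(δ − b)(δ' − b'). This makes it clear that the condition to recover f_cc is that δ' > b' for most realisations: if ϑ(δ' − b') = 1, the average is just d/ds ∫_b^∞ p(δ;s) dδ, which is equation (4). This is exactly the case for correlated steps in the large mass regime, since δ = b implies that typically δ' ∼ b/s, and b' ≪ b/s for small s (as long as the barrier is not receding too fast from its initial value).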

In terms of the conditional probability p(v|b(s)) ≡ p(b(s),v;s)/p(b(s);s), and omitting for ease of notation all explicit s dependences, the rate can also be written as

$$ f(s) \simeq p(b) \int_{b'}^{\infty} dv\,(v - b')\,p(v|b). \qquad (7) $$

This allows a very intuitive explanation in terms of particles in a box: p(b) plays the role of a number density at b (the number of particles in the one-dimensional volume element dδ), while the integral is the mean of δ' − b' over all velocities larger than the barrier's increment given that δ = b, that is, the average escape velocity at b. The product of the two evaluated at the boundary is by definition the escape rate from the box. This also makes it easy to see the connection to deterministic walks, for which p(v|b) → δ_D(v − b/2s), and thus f(s) → p(b)(b/2s − b') = −p(b/√s) d(b/√s)/ds (where, on the right, p denotes the scale-independent distribution of δ/√s), which is indeed f_cc(s). Of course, at larger s, when p(v|b) is broader, equation (7) is a substantially more accurate approximation for f(s).
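The particles-in-a-box reading of equation (7) can be made concrete with a minimal sketch. It assumes, purely for illustration, a Gaussian p(δ;s) and a Gaussian conditional p(v|b) centred on the deterministic slope b/2s with a free width sigma_v (a hypothetical parameter). When sigma_v → 0 the escape rate collapses to p(b)(b/2s − b'), i.e. f_cc; any nonzero width increases the mean escape speed.

```python
import math


def escape_rate(b, db, s, sigma_v):
    """Equation (7) with a Gaussian conditional p(v|b) of mean b/(2s) (the slope of a
    deterministic walk) and hypothetical width sigma_v. Returns
    p(b) * <(v - b') Theta(v - b')>: density at the wall times mean escape speed."""
    p_b = math.exp(-b * b / (2.0 * s)) / math.sqrt(2.0 * math.pi * s)  # Gaussian p(delta = b; s)
    mu = b / (2.0 * s) - db  # mean of v - b'
    if sigma_v == 0.0:
        return p_b * max(mu, 0.0)  # deterministic walks all escape at speed mu
    z = mu / sigma_v
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return p_b * (mu * cdf + sigma_v * pdf)  # E[max(Y, 0)] for Y ~ N(mu, sigma_v^2)


b, s = 1.686, 1.0
for width in (0.0, 0.1, 0.5):
    print(width, escape_rate(b, 0.0, s, width))
```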



2.3 Accounting for multiple upcrossings 

The approximation of equation (5) accounts for all walks that cross the barrier upwards at s, including those that crossed the barrier previously, and thus it overestimates f(s). The error is expected to increase as s gets large, when such walks become increasingly common. Removing all the walks that crossed at s₁ < s, i.e. walks with δ(s₁) = b(s₁) and v(s₁) > b'(s₁), and then integrating over s₁, would account correctly for the trajectories with just one crossing before the last one. So, if we stopped here, and assuming for simplicity a constant barrier, then we would get

$$ f(s) \simeq \int_0^{+\infty} dv\,v\,p(b,v) - \int_0^{s} ds_1 \int_0^{\infty} dv_1\,v_1 \int_0^{\infty} dv\,v\;p(b_1,v_1,b,v) + \dots, \qquad (8) $$

where p(b₁,v₁,b,v) is the quadrivariate distribution of δ(s₁), δ'(s₁), δ(s) and δ'(s), and b₁ = b(s₁) = b. It is straightforward to include a moving barrier, simply inserting b' and b'₁ where needed (à la equation (5)).

Trajectories crossing more than once would now be overcounted: for instance, a single walk crossing at s₁ and s₂ would be removed twice by this procedure, and needs to be reintroduced. This would call for an additional correction, for walks crossing twice or more, containing p(b₁,v₁,b₂,v₂,b,v) and integrals over s₁ and s₂, and so on. However, trajectories with more zigzags will be even more suppressed, making an expansion in the number of crossings meaningful in the sense of perturbation theory at small s.
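The bookkeeping behind equation (8) is an inclusion-exclusion expansion. For a walk that starts below the barrier, "step i is the first crossing" equals u_i Π_{j<i}(1 − u_j), where u_j = 1 if the walk upcrosses at step j, and expanding the product orders the terms by the number of earlier upcrossings. The toy sketch below applies this to a discretised walk; it illustrates the counting only, not the continuous calculation, and the helper names are hypothetical. The zigzag example deliberately has many upcrossings, which correlated walks at small s rarely do, so its truncated series oscillates before converging.

```python
from math import comb


def upcrossing_steps(walk, barrier):
    """Indices i where a discretised walk goes from below the barrier to at/above it."""
    return [i for i in range(1, len(walk)) if walk[i - 1] < barrier <= walk[i]]


def first_crossing_indicator(walk, barrier, i, order):
    """Truncated expansion of 'step i is the first crossing':
    u_i * prod_{j<i} (1 - u_j) = u_i * sum_m (-1)^m C(N, m),
    with N the number of earlier upcrossings; terms with m > order are dropped.
    order = 0 counts every upcrossing (the leading term); order = 1 subtracts
    walks with one earlier upcrossing, and so on."""
    ups = upcrossing_steps(walk, barrier)
    if i not in ups:
        return 0
    n_earlier = sum(1 for j in ups if j < i)
    return sum((-1) ** m * comb(n_earlier, m) for m in range(order + 1))


zigzag = [0.0, 2.0, 0.0, 2.0, 0.0, 2.0]  # upcrosses barrier 1 at steps 1, 3 and 5
print([first_crossing_indicator(zigzag, 1.0, 5, k) for k in range(3)])  # [1, -1, 0]
```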

Similarly to equation (6) for the leading order term, the first subleading correction can also be written in a more evocative way as

$$ -\int_0^{s} ds_1 \left\langle \frac{\partial\,\vartheta(\delta - b)}{\partial s}\,\vartheta(\delta' - b')\;\frac{\partial\,\vartheta(\delta_1 - b_1)}{\partial s_1}\,\vartheta(\delta_1' - b_1') \right\rangle, \qquad (9) $$

and the same pattern holds for higher order corrections. A rigorous derivation of this expansion from a path integral expression is carried out in Appendix A. However, for most cosmological applications, the analysis of Musso & Sheth (2012) is sufficiently accurate, so one does not even need the second term of equation (8).



2.4 Gaussian or not? 

Before moving on, it is worth noting that the logic above holds in full generality, regardless of the shape of the distribution: the completely correlated non-Gaussian walks have a modified s dependence, but the first crossing distribution in this limit is still given by equation (4), and this limit will still be a good approximation as s → 0. At larger s, an expansion in the number of previous upcrossings is still sensible, and constraining the slope of the walk is still the most natural and efficient way of ensuring it is upcrossing. So, equation (5) should remain a good approximation until s values where walks which have previously upcrossed more than once dominate, at which point the next terms in the program (outlined in Section 2.3) will become important.

That said, there is one sense in which the non-Gaussian case is more complicated. For a Gaussian field, the probability distribution of δ on scale s only depends on the ratio δ/√s. This makes

$$ f_{\rm cc}(s) = -p\!\left(\frac{b}{\sqrt{s}}\right)\frac{d}{ds}\!\left(\frac{b}{\sqrt{s}}\right), \qquad p(x) = \frac{e^{-x^2/2}}{\sqrt{2\pi}}, \qquad (10) $$

where for ease of notation we have not written the scale dependence of b(s) explicitly. If the barrier is constant, b = δc, then δc/√s is usually called ν and one finds s f(s) = ν p(ν)/2: the final factor of 1/2 is the reason Press & Schechter multiplied by 2 so many years ago. But notice that, in this limit, the first crossing distribution is very simply related to the shape of the pdf. This would also apply to the non-Gaussian case, provided that the distribution of δ is indeed a function of δ/√s only. This is rarely the case, but as we will see it becomes a reasonable approximation on very large scales.
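The origin of the factor of 2 can be seen numerically: for a Gaussian field and a constant barrier, integrating s f_cc(s) = ν p(ν)/2 over ln s gives a total crossed fraction of only 1/2, because as s → ∞ only half of the walks lie above any fixed barrier. The sketch below is a minimal check under those assumptions; the integration limits and step count are numerical choices.

```python
import math


def s_times_f_cc(s, delta_c):
    """s f_cc(s) = nu p(nu) / 2 for a Gaussian field and constant barrier, nu = delta_c / sqrt(s)."""
    nu = delta_c / math.sqrt(s)
    return 0.5 * nu * math.exp(-0.5 * nu * nu) / math.sqrt(2.0 * math.pi)


def total_crossed_fraction(delta_c, ln_s_min=-10.0, ln_s_max=30.0, n=40000):
    """int_0^inf f_cc(s) ds = int s f_cc(s) d ln s, by the trapezoidal rule in ln s."""
    h = (ln_s_max - ln_s_min) / n
    total = 0.0
    for i in range(n + 1):
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * s_times_f_cc(math.exp(ln_s_min + i * h), delta_c)
    return total * h


print(total_crossed_fraction(1.686))  # close to 0.5, hence Press & Schechter's factor of 2
```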
3 EXPLICIT CALCULATION

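In what follows, it will be convenient to use the rescaled stochastic quantities

$$ A \equiv \frac{\delta}{\sqrt{s}}, \qquad A' \equiv \frac{dA}{ds} \qquad \text{and} \qquad \xi \equiv 2\Gamma s\,A', \qquad (11) $$

where Γ, defined by (2Γs)² = 1/⟨A'²⟩, is a weak function of s (e.g. Musso & Sheth 2012). Notice that

$$ \langle A^2 \rangle = \langle \xi^2 \rangle = 1 \qquad \text{and} \qquad \langle A\,\xi \rangle = 0, \qquad (12) $$

where the last equality follows from ⟨δδ'⟩ = (1/2) d⟨δ²⟩/ds = 1/2; i.e., A and ξ are uncorrelated (and hence independent if δ is Gaussian). Similarly, we will work with

$$ B(s) \equiv \frac{b(s)}{\sqrt{s}} \qquad \text{and} \qquad X \equiv -2\Gamma s\,B', \qquad (13) $$

where B' = dB/ds. The sign of X is chosen so that a typical barrier has X > 0, since b(s) for most problems of current interest does not vary much with s, and thus B' < 0. Since we are enforcing δ = b, in these rescaled variables f(s) reads

$$ f(s) \simeq -B' \int_{-X}^{\infty} d\xi \left(1 + \frac{\xi}{X}\right) p(B,\xi;s). \qquad (14) $$

Equivalently, equation (6) becomes

$$ f(s) \simeq -\frac{\partial}{\partial s} \int_{-\infty}^{B(s)} dA \int_{-X(s_1)}^{\infty} d\xi\; p(A,\xi;s,s_1), \qquad (15) $$

where s₁ must be set equal to s after taking the derivative. Since X ∼ B, we see explicitly that we recover f_cc(s) in the large mass X ≫ 1 limit.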
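The rescaled barrier quantities of equation (13) are straightforward to evaluate. The sketch below treats Γ as a given constant (an assumption; the text notes it is a weak function of s) and uses a central difference for B'. For a constant barrier B' = −B/2s, so X = ΓB; for a hypothetical linear barrier b = δc + βs one gets X = Γ(δc/√s − β√s), so a rising barrier lowers X.

```python
import math


def rescaled_barrier(barrier, s, gamma, h=1e-6):
    """Return (B, B', X) of equation (13): B = b/sqrt(s), B' = dB/ds by central
    differences, and X = -2 Gamma s B'. Gamma is treated as a constant here,
    which is an assumption."""
    B = lambda t: barrier(t) / math.sqrt(t)
    dB = (B(s + h) - B(s - h)) / (2.0 * h)
    return B(s), dB, -2.0 * gamma * s * dB


gamma, delta_c = 0.6, 1.686  # hypothetical values, for illustration
B, dB, X = rescaled_barrier(lambda s: delta_c, 2.0, gamma)
print(X, gamma * B)  # constant barrier: B' = -B/(2s), so X = Gamma * B
```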



3.1 Summary of the Gaussian result 

If δ is a Gaussian process, then the joint distribution of A and ξ is particularly simple because ⟨Aξ⟩ = 0. When A = B, then

$$ p_G(B,\xi) = p_G(B)\,p_G(\xi) = \frac{e^{-B^2/2 - \xi^2/2}}{2\pi}. \qquad (16) $$

Inserting this in equation (14) shows that f(s) will be proportional to −B' p_G(B), which for a Gaussian process is just f_cc(s), times a correction factor that is a function of X alone. Performing the integral yields

$$ f(s) = -B'\,\frac{e^{-B^2/2}}{\sqrt{2\pi}} \left[\frac{1 + \mathrm{erf}(X/\sqrt{2})}{2} + \frac{e^{-X^2/2}}{\sqrt{2\pi}\,X}\right]. \qquad (17) $$

This reduces to equation (10), and therefore to f_cc(s), for X ≫ 1 (the first term in the square brackets tends to unity while the second one is exponentially suppressed).
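Equation (17) can be cross-checked against a direct quadrature of equation (5). For a Gaussian field, equations (11) and (12) imply ⟨δ²⟩ = s, ⟨δδ'⟩ = 1/2 and ⟨δ'²⟩ = (1 + Γ⁻²)/4s, so the conditional width of v at fixed δ is 1/(2Γ√s). The sketch below assumes a constant Γ and uses illustrative parameter values; it integrates (v − b') p(b,v;s) numerically and compares the result with the closed form, including its X ≫ 1 limit −B' p_G(B).

```python
import math


def bivariate_pdf(delta, v, s, gamma):
    """Gaussian p(delta, v; s) with <delta^2> = s, <delta v> = 1/2 and
    <v^2> = (1 + Gamma^-2) / (4 s)."""
    c11, c12, c22 = s, 0.5, (1.0 + gamma**-2) / (4.0 * s)
    det = c11 * c22 - c12 * c12  # equals 1 / (4 Gamma^2)
    q = (c22 * delta * delta - 2.0 * c12 * delta * v + c11 * v * v) / det
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))


def f_eq5(b, db, s, gamma, n=20000):
    """Equation (5) by the trapezoidal rule: int_{b'}^inf dv (v - b') p(b, v; s)."""
    sigma = 1.0 / (2.0 * gamma * math.sqrt(s))  # width of p(v | delta = b)
    v_max = max(b / (2.0 * s), db) + 12.0 * sigma
    h = (v_max - db) / n
    total = 0.0
    for i in range(n + 1):
        v = db + i * h
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * (v - db) * bivariate_pdf(b, v, s, gamma)
    return total * h


def f_eq17(b, db, s, gamma):
    """Closed form, equation (17), in the rescaled variables B, B', X (needs X > 0)."""
    B = b / math.sqrt(s)
    dB = db / math.sqrt(s) - b / (2.0 * s**1.5)
    X = -2.0 * gamma * s * dB
    pG = math.exp(-0.5 * B * B) / math.sqrt(2.0 * math.pi)
    bracket = 0.5 * (1.0 + math.erf(X / math.sqrt(2.0))) + math.exp(-0.5 * X * X) / (math.sqrt(2.0 * math.pi) * X)
    return -dB * pG * bracket


print(f_eq5(1.686, 0.0, 2.0, 0.6), f_eq17(1.686, 0.0, 2.0, 0.6))
```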

Musso & Sheth (2012) showed that, for a wide variety of smoothing filters, power spectra and barrier shapes, this expression was substantially more accurate than f_cc, and accurate down to scales on which a substantial fraction of the walks might have negative slopes. However, it cannot be accurate to arbitrarily small scales, since the integral of f(s) over all s diverges: for a constant barrier and slowly varying Γ, X = ΓB → 0 as s → ∞, the second term in the brackets of equation (17) grows as 1/X, and f(s) ∝ 1/s. This is, of course, related to the fact that multiple upcrossings of the barrier become important as s increases. Appendix A describes how to account for these, but since we have not found a similarly simple analytic expression for the resulting f(s), and the range over which equation (17) is accurate covers most of the range which is of interest in cosmology, we will continue with this simpler case.

Before moving on, we think the special case of Gaussian walks with correlated steps crossing a constant barrier deserves further comment. This is because, once one accounts for differences in notation and presentation, our equation (17) turns out to be the same as equation (3.14a) of Bond et al. (1991) for the first crossing distribution of a barrier of constant height. The origin of this agreement is that the expression within angle brackets in our equation (6) is the same as their equation (3.12a). (The same is true for our equation (A7) and their A3.) However, they appear to have made an error when comparing their equation (3.14a) with the Monte-Carlo solution of the constant barrier problem: their Figure 9 suggests that ignoring multiple crossings is a bad approximation, whereas Musso & Sheth (2012) showed that it is in fact rather good. (The Monte-Carlo solutions themselves are in good agreement.) This led them, and the rest of the field since, to dismiss the approximation in which one ignores multiple upcrossings, and to focus instead on what appeared to be a more tractable problem (in which one ignores correlations between steps). In this respect, one might view our analysis of the constant barrier problem as having corrected an error which went unnoticed for more than twenty years. Of course, our analysis is more general, since we have shown how to apply it (successfully) to arbitrary barriers. We now show how it can be generalized to arbitrary fluctuation fields.

3.2 Generic non-Gaussian case 

The joint probability distribution of a generic stochastic process can always be written as an asymptotic expansion in Hermite polynomials around the Gaussian distribution obtained from the second moments. Since A and ξ are uncorrelated, this Gaussian factorizes, and

$$ p(B,\xi) = \sum_{n,k} \frac{\langle H_n(\hat A)\,H_k(\hat\xi)\rangle}{n!\,k!}\,H_n(B)\,H_k(\xi)\,p_G(B)\,p_G(\xi), \qquad (18) $$

where we have used hats to distinguish the stochastic quantities from the continuous variables of the probability distribution, and H_n(x) = e^{x²/2}(−d/dx)^n e^{−x²/2}. This expression follows from the fact that the Hermite polynomials form an orthogonal basis with respect to the Gaussian weight. Rearranging the terms of the sum and factoring out p(B) shows that

$$ p(B,\xi) = p(B) \sum_k \frac{\langle H_k(\hat\xi)\,|\,B\rangle}{k!} \left(-\frac{d}{d\xi}\right)^{k} p_G(\xi), \qquad (19) $$

where

$$ \langle H_k(\hat\xi)\,|\,B\rangle = \frac{\sum_n \langle H_n(\hat A)\,H_k(\hat\xi)\rangle\,H_n(B)/n!}{\sum_n \langle H_n(\hat A)\rangle\,H_n(B)/n!}, \qquad (20) $$

and ⟨f(ξ̂)|B⟩ is the expectation value of f(ξ̂) given that Â = B, i.e. the one computed from p(ξ|B).
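The Hermite machinery of equations (18)-(20) rests on two facts: orthogonality, ⟨H_n H_m⟩_G = n! δ_nm under the Gaussian weight, and the coefficients ⟨H_n(Â)⟩ that encode departures from the Gaussian. The sketch below builds the probabilists' H_n from the recurrence H_{n+1} = xH_n − nH_{n−1} and checks both numerically; as a simple test distribution it uses a unit-variance Gaussian shifted by μ, for which ⟨H_n(Â)⟩ = μ^n. These are illustrative checks, not part of the original analysis.

```python
import math


def hermite(n, x):
    """Probabilists' Hermite polynomial H_n(x) = e^{x^2/2} (-d/dx)^n e^{-x^2/2},
    via the recurrence H_{n+1} = x H_n - n H_{n-1}."""
    h_prev, h = 1.0, x
    if n == 0:
        return h_prev
    for k in range(1, n):
        h_prev, h = h, x * h - k * h_prev
    return h


def gaussian_average(func, mean=0.0, lo=-15.0, hi=15.0, n=6000):
    """int func(x) p_G(x - mean) dx by the trapezoidal rule (p_G is the unit normal pdf)."""
    step = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        x = lo + i * step
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * func(x) * math.exp(-0.5 * (x - mean) ** 2)
    return total * step / math.sqrt(2.0 * math.pi)


# Orthogonality with respect to the Gaussian weight: <H_n H_m>_G = n! delta_nm.
print(gaussian_average(lambda x: hermite(3, x) ** 2))             # close to 3! = 6
print(gaussian_average(lambda x: hermite(2, x) * hermite(3, x)))  # close to 0
# Coefficient <H_n(A)> for a unit-variance Gaussian shifted by mu = 0.3 is mu^n.
print(gaussian_average(lambda x: hermite(3, x), mean=0.3))        # close to 0.027
```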

This expression must be inserted into equation (14) and integrated over ξ. The k = 0 term is just p(B)p_G(ξ), and gives the same as equation (17), with p_G(B) replaced by p(B). The following ones can be integrated by parts, and they pick up a factor of −(B'/X)(d/dX)^{k−1}[1 + erf(X/√2)]/2, which for k ≥ 2 becomes (−1)^k H_{k−2}(X) p_G(X)/(2Γs). For k = 1, one has

$$ \langle H_n(\hat A)\,H_1(\hat\xi)\rangle = -\frac{X}{B'}\,\frac{\langle H_{n+1}(\hat A)\rangle'}{n+1}, \qquad (21) $$

where the prime denotes d/ds; furthermore, H_n(B)p_G(B) = ∫_B^∞ dA H_{n+1}(A)p_G(A), so that

$$ p(B)\,\langle H_1(\hat\xi)\,|\,B\rangle = -\frac{X}{B'} \int_B^{\infty} dA\,\frac{\partial p(A;s)}{\partial s}. \qquad (22) $$

Putting all the contributions together yields

$$ \begin{aligned} f(s) \simeq {} & -\frac{d}{ds}\left[\int_{-\infty}^{B(s)} dA\,p(A;s)\right] \frac{1 + \mathrm{erf}(X/\sqrt{2})}{2} \\ & - B'\,p(B)\,\frac{e^{-X^2/2}}{\sqrt{2\pi}\,X} \left[1 + \sum_{k \ge 0} \frac{(-1)^k\,\langle H_{k+2}(\hat\xi)\,|\,B\rangle}{(k+2)!}\,H_k(X)\right]. \end{aligned} \qquad (23) $$

Equation (23), the main result of this paper, is the non-Gaussian generalization of equation (17). The connection is most clearly seen by supposing that p(δ;s) is a function of δ/√s alone, in which case the derivative of the integral in the first line, which we could have written as f_cc(s), becomes −B'p(B). Then equation (23), apart from the polynomial corrections in the square brackets of the second line, which we suppose small, becomes equation (17), except that here p(B) is the full non-Gaussian distribution. In general, of course, p(δ;s) will not be a function of δ/√s only; the associated departure from self-similarity will introduce an additional term, and this is why equation (23) involves an explicit derivative with respect to s.

In the large mass regime, the entire second line of equation (23) is exponentially suppressed with respect to the first one, as it was in the Gaussian case, and the first crossing rate reduces to equation (4). The polynomial corrections in the second line, which would blow up for X ≫ 1, have a chance of becoming non-negligible only at larger s, when the exponential suppression is no longer effective. In this regime, however, the perturbative treatment of these non-Gaussian corrections is fully under control.

A remarkable feature of this result is that, although we started from the non-Gaussian bivariate distribution p(b,v;s), all the relevant non-Gaussian corrections (those that become non-perturbative on large scales) can be expressed in terms of the univariate non-Gaussian distribution p(b;s). We have thus managed to disentangle the problem of the first crossing of the barrier from that of the evaluation of the probability of the walks, which we deal with next.
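The integrations by parts that lead from equation (14) to equation (23) reduce to a handful of one-dimensional integrals, I_k(X) = ∫_{−X}^∞ dξ (1 + ξ/X) H_k(ξ) p_G(ξ), each entering f(s) as −B' p(B) ⟨H_k|B⟩ I_k / k!. Integrating by parts gives I_0 = Φ(X) + p_G(X)/X, I_1 = Φ(X)/X and, for k ≥ 2, I_k = (−1)^k H_{k−2}(X) p_G(X)/X, with Φ(X) = [1 + erf(X/√2)]/2. The sketch below, whose function names are our own, checks these closed forms against Simpson's-rule quadrature.

```python
import math


def hermite(n, x):
    """Probabilists' Hermite polynomial via H_{n+1} = x H_n - n H_{n-1}."""
    h_prev, h = 1.0, x
    if n == 0:
        return h_prev
    for k in range(1, n):
        h_prev, h = h, x * h - k * h_prev
    return h


def p_gauss(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def I_numeric(k, X, upper=14.0, n=20000):
    """int_{-X}^{inf} (1 + xi/X) H_k(xi) p_G(xi) d xi by Simpson's rule (n even)."""
    h = (upper + X) / n
    total = 0.0
    for i in range(n + 1):
        xi = -X + i * h
        weight = 1.0 if i in (0, n) else (4.0 if i % 2 else 2.0)
        total += weight * (1.0 + xi / X) * hermite(k, xi) * p_gauss(xi)
    return total * h / 3.0


def I_closed(k, X):
    """Closed forms obtained by integrating by parts."""
    Phi = 0.5 * (1.0 + math.erf(X / math.sqrt(2.0)))
    if k == 0:
        return Phi + p_gauss(X) / X
    if k == 1:
        return Phi / X
    return (-1) ** k * hermite(k - 2, X) * p_gauss(X) / X


for k in range(5):
    print(k, I_numeric(k, 1.3), I_closed(k, 1.3))
```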

3.3 Non-Gaussian case: large mass limit 

We have already argued that in the large mass limit the formal expression for the first crossing rate coincides with equation (4). To see this explicitly, differentiate with respect to s to get

$$ f(s) \simeq -\left[B' + \sum_{n \ge 3} \frac{(-1)^n}{n!}\,\frac{d\langle A^n\rangle_c}{ds} \left(\frac{d}{dB}\right)^{n-1}\right] p(B;s), \qquad (24) $$

where the infinite sum is the (integral of the) Kramers-Moyal expansion for ∂p/∂s. The crucial point is therefore to compute p(B) from the moments of the distribution.
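Equation (24), f ≃ −[B' + Σ_n ((−1)^n/n!)(d⟨A^n⟩_c/ds)(d/dB)^{n−1}] p(B;s), can be checked in a setting where everything is known in closed form. The sketch below assumes a first-order Edgeworth pdf, p(A;s) = p_G(A)[1 + ⟨A³⟩_c(s) H_3(A)/6], with a hypothetical scale dependence ⟨A³⟩_c = 0.03√s and a constant barrier. For this model ∂p/∂s = (⟨A³⟩_c'/3!)(−d/dA)³ p_G, so keeping only the n = 3 term is exact provided the B-derivatives act on the Gaussian factor; the finite-difference f_cc and the n = 3 formula then agree to within finite-difference error.

```python
import math


def p_gauss(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def cdf_edgeworth(B, k3):
    """int_{-inf}^{B} p(A) dA for p(A) = p_G(A) [1 + k3 H_3(A) / 6], using
    int_{-inf}^{B} H_3 p_G dA = -H_2(B) p_G(B) with H_2(B) = B^2 - 1."""
    return 0.5 * (1.0 + math.erf(B / math.sqrt(2.0))) - k3 * (B * B - 1.0) * p_gauss(B) / 6.0


def f_cc_numeric(delta_c, k3, s, h=1e-5):
    """-d/ds of the probability below B(s) = delta_c / sqrt(s), by central differences."""
    F = lambda t: cdf_edgeworth(delta_c / math.sqrt(t), k3(t))
    return -(F(s + h) - F(s - h)) / (2.0 * h)


def f_cc_kramers_moyal(delta_c, k3, dk3, s):
    """Equation (24) keeping only n = 3: -B' p(B) + (k3'/3!) H_2(B) p_G(B)."""
    B = delta_c / math.sqrt(s)
    dB = -B / (2.0 * s)  # constant barrier
    p_B = p_gauss(B) * (1.0 + k3(s) * (B**3 - 3.0 * B) / 6.0)
    return -dB * p_B + dk3(s) * (B * B - 1.0) * p_gauss(B) / 6.0


# Hypothetical, mildly scale-dependent skewness <A^3>_c(s) = 0.03 sqrt(s).
k3 = lambda s: 0.03 * math.sqrt(s)
dk3 = lambda s: 0.015 / math.sqrt(s)
print(f_cc_numeric(1.686, k3, 1.5), f_cc_kramers_moyal(1.686, k3, dk3, 1.5))
```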

The single point distribution p(B) can be written as

$$ p(B;s) = \frac{e^{W(B;s)}}{\sqrt{2\pi}}, \qquad (25) $$

where the full expression of W(B;s) in terms of the moments is given in Appendix B as a series of modified Hermite polynomials. This function, which corresponds to the logarithm of the Edgeworth expansion of p(B;s), has a straightforward interpretation in terms of connected Feynman diagrams constructed out of the connected moments of the distribution, and better convergence properties than the Edgeworth expansion itself. We also show in the Appendix that the large mass limit (B ≫ 1) of W(B;s) is obtained by keeping only the highest order term of each polynomial, which in diagrammatic language corresponds to discarding diagrams with loops. This approximation is also what one would get doing the analysis in Fourier space and transforming back to real space by means of a stationary phase approximation. In this regime, the infinite series of polynomials turns into a simpler infinite power series, whose first terms are

$$ W(B;s) \simeq -\frac{B^2}{2} + \frac{\langle A^3\rangle_c}{3!}\,B^3 + \frac{\langle A^4\rangle_c - 3\langle A^3\rangle_c^2}{4!}\,B^4 + \dots \qquad (26) $$

In the same spirit, we can approximate the n-th derivative as (d/dB)^n p(B) ≈ (dW/dB)^n p(B), since higher derivatives of W(B;s) also correspond to loop diagrams and are subleading.
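As noted above, the loop-free limit of W(B;s) coincides with a stationary-phase evaluation, i.e. a Legendre transform of the cumulant generating function. The sketch below assumes, for illustration only, that K(t) = t²/2 + ⟨A³⟩_c t³/6 with all higher cumulants set to zero, solves the saddle-point condition K'(t*) = B exactly, and compares W = K(t*) − t*B with the truncated power series −B²/2 + ⟨A³⟩_c B³/3! − 3⟨A³⟩_c² B⁴/4!; the residual should scale roughly as ⟨A³⟩_c³ B⁵.

```python
import math


def W_saddle(B, k3):
    """Tree-level log-density (without the 1/sqrt(2 pi) normalisation) for a cumulant
    generating function truncated at third order, K(t) = t^2/2 + k3 t^3/6:
    W(B) = K(t*) - t* B, where the saddle point t* solves K'(t*) = B.
    Requires 1 + 2 k3 B > 0."""
    if k3 == 0.0:
        return -0.5 * B * B
    t = (math.sqrt(1.0 + 2.0 * k3 * B) - 1.0) / k3  # root of t + k3 t^2/2 = B that tends to B
    K = 0.5 * t * t + k3 * t**3 / 6.0
    return K - t * B


def W_series(B, k3, k4=0.0):
    """First three terms of the large-mass power series for W."""
    return -0.5 * B * B + k3 * B**3 / 6.0 + (k4 - 3.0 * k3 * k3) * B**4 / 24.0


for k3 in (0.01, 0.005):
    print(k3, W_saddle(3.0, k3), W_series(3.0, k3))
```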

The consistency of the truncation of W(B;s) is a delicate subject. Clearly, as B becomes large one should keep adding more and more terms to equation (26), especially if the non-Gaussian moments are large, and a true B → ∞ limit would necessarily require resumming the whole series. Fortunately, the range of values of interest for cosmology (where B increases both with mass and redshift) is not so extreme, since primordial non-Gaussianities are fairly small. As discussed by D'Amico et al. (2011), for values of ⟨A³⟩_c ∼ 0.01 and B ∼ 10 (corresponding to the most massive clusters of galaxies) the three terms listed in equation (26) are O(100), O(10) and O(1) respectively, while neglected terms start at O(10⁻¹). These values are obtained for primordial non-Gaussianity with f_NL ∼ 100, which is now excluded by the Planck mission (Ade et al. 2013). However, even larger values of B can be attained at higher redshifts, or by the study of different objects like the reionisation pattern of cosmic structures (Joudaki et al. 2011; D'Aloisio et al. 2013), so that the discussion about how to truncate W(B), besides having its own theoretical interest, is not unnecessary.

Truncating the Kramers-Moyal series in equation (24) is, on the other hand, less dangerous. The reduced moments typically tend to a constant on large scales, and the presence of their derivative in the coefficients of the series introduces an additional suppression. Furthermore, this series does not sit in an exponential, and errors in the truncation are potentially less harmful. Already for the n = 3 term of the series, keeping just the leading term of dW/dB gives an O(1) result (or less, given the additional suppression due to the scale derivative). Within the range of parameters outlined above, a fair approximation for the first crossing rate is thus

$$ f(s) \simeq -\left[B' - \frac{1}{3!}\,\frac{d\langle A^3\rangle_c}{ds} \left(\frac{dW}{dB}\right)^{2}\right] \frac{e^{W(B;s)}}{\sqrt{2\pi}}, \qquad (27) $$

with W(B;s) given by equation (26).

In many cases ⟨A³⟩_c (and more generally ⟨A^n⟩_c) is only weakly scale dependent. If we can drop the ⟨A³⟩'_c term, then the expression above simplifies even further, reducing to equation (10) with p(B) now the non-Gaussian pdf. This is just the Gaussian result, −B' exp(−B²/2)/√(2π), times the non-Gaussian correction to p(B), exp(⟨A³⟩_c B³/3! + ...). This factorisation justifies the common practice of obtaining the full non-Gaussian mass function as the product of the fit from Gaussian simulations times an analytically predicted non-Gaussian correction, known as the non-Gaussian to Gaussian ratio. As already pointed out by Musso & Paranjape (2012), this ratio is simply the ratio of the pdfs. Our results confirm this intuition, and at the same time highlight the conditions under which this result is true.
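A minimal numerical illustration of this factorisation, assuming the form f(s) ≃ −[B' − (⟨A³⟩_c'/3!)(dW/dB)²] e^W/√(2π) of equation (27), with W truncated at −B²/2 + ⟨A³⟩_c B³/3! (an assumption adequate only when ⟨A³⟩_c B is small), a constant barrier at s = 1 and hypothetical values of ⟨A³⟩_c and its scale derivative. With ⟨A³⟩_c' = 0 the non-Gaussian to Gaussian ratio of f(s) equals the ratio of the pdfs exactly; a nonzero scale derivative breaks the equality.

```python
import math


def f_eq27(B, dB, k3, dk3):
    """Equation (27) with W truncated at the B^3 term (an assumption)."""
    W = -0.5 * B * B + k3 * B**3 / 6.0
    dW = -B + 0.5 * k3 * B * B  # dW/dB for the same truncation
    return -(dB - dk3 * dW * dW / 6.0) * math.exp(W) / math.sqrt(2.0 * math.pi)


def f_gauss(B, dB):
    """Gaussian large-mass limit: -B' exp(-B^2/2) / sqrt(2 pi)."""
    return -dB * math.exp(-0.5 * B * B) / math.sqrt(2.0 * math.pi)


# Constant barrier at s = 1, so B' = -B/2; hypothetical skewness k3 = 0.02.
B, dB, k3 = 3.0, -1.5, 0.02
pdf_ratio = math.exp(k3 * B**3 / 6.0)  # ratio of the non-Gaussian and Gaussian pdfs
print(f_eq27(B, dB, k3, 0.0) / f_gauss(B, dB), pdf_ratio)   # equal when k3' = 0
print(f_eq27(B, dB, k3, 0.01) / f_gauss(B, dB), pdf_ratio)  # differ when k3' != 0
```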



3.4 Relation to previous work 

Most previous work on non-Gaussian excursion sets has considered a barrier of constant height, for which −B' = B/2s. This is for instance the case of Matarrese et al. (2000) and LoVerde et al. (2008), who also explicitly assume that f(s) = 2 f_cc(s) (see however the discussion on the fudge factor by Matarrese et al.). This assumption, together with the choice of a Top-Hat filter like the one they use, does not appear justified from the point of view of excursion sets. However, their main concern was reproducing the results of N-body simulations, rather than excursion sets, and multiplying by 2 was going in the correct direction. Also, this error disappears when considering the non-Gaussian to Gaussian ratio, as they did, with the aim of computing the correction that should multiply the result of Gaussian simulations.

The correspondence between our expression in the large mass limit and theirs is helped by noting that it is conventional to define σS₃ = ⟨A³⟩_c, so our ⟨A³⟩'_c = d(σS₃)/ds. Since, for them, equation (4) with an extra factor of 2 is the full story, they are in effect missing the X-dependent corrections which matter at smaller masses. Our large mass limit differs slightly from that of LoVerde et al. only because they keep the Edgeworth expansion, while we have been careful about how we write the large mass limit of equation (25). In this we followed D'Amico et al. (2011), who pointed out that perturbative non-Gaussian corrections blow up at small s, and need to be resummed in an exponential, whose argument corresponds to equation (26) in this regime. The same approach is followed by LoVerde & Smith (2011).

If σS₃ is only weakly scale-dependent, so that the ⟨A³⟩'_c term can be dropped, then the expression above simplifies even further: it is just the Gaussian result for f(s) times the non-Gaussian correction to p(B). Musso & Paranjape (2012) used this to argue that the large scale limit of the non-Gaussian mass function from correlated random walks is always one half of the one obtained without filter-induced correlations, finding very good agreement with Monte-Carlo simulations.

Moving barriers and weakly non-Gaussian fields were first considered by Lam & Sheth (2009), but only for a sharp-k filter. They found that, for moving barriers also, the large mass limit is just the Gaussian result times the non-Gaussian correction to the pdf, provided d(σS₃)/ds is small. Our more general analysis confirms this is true for other filters also, although the Gaussian result itself depends on the smoothing filter. Although writing f(s) this way is common practice, our analysis shows that it is not appropriate at lower masses, nor will it be accurate if σS₃ is scale-dependent, the latter being a point also made by Musso & Paranjape (2012).

A self-consistent treatment of excursion sets with correlated steps was attempted by Maggiore & Riotto (2010), who used a path integral formalism to compute the first crossing rate for barriers of constant height. Unfortunately, their choice to expand around the uncorrelated Gaussian solution makes the calculations very involved, and its reliability becomes problematic for large masses. Moreover, it only works for one specific choice of filter (Top-Hat) and matter power spectrum (ΛCDM), where their results are within 10% of the correct answer for Gaussian walks. The same is true for other works following the same approach, like Corasaniti & Achitouv (2011) (who were only able to consider linear barriers with small slope) and D'Amico et al. (2011) (who did not consider moving barriers, but focussed on the safer non-Gaussian to Gaussian ratio, and assessed the range of validity of their results).



4 HALO ABUNDANCES DIRECTLY FROM THE NONLINEAR FIELD

Although the excursion set approach was formulated to predict the abundance of nonlinear objects from the initial fluctuation field, we can use it to predict halo abundances from the late time field as well. This is because equation (23) is valid even if $p(b)$ is highly non-Gaussian. The problem is particularly simple because halos are often identified in the nonlinear field by finding a spherical or triaxial patch whose density is a fixed multiple of the background density, independent of halo mass (Despali et al. 2013). In effect, this means our excursion set approach, applied to the nonlinear non-Gaussian field with a constant barrier, is an analytic model of the numerical halo finding algorithm.

This has an important consequence for studies of the halo distribution which seek to approximate the smoothed halo field as a Taylor series in quantities derived from the underlying matter distribution. If the Taylor series is in the matter overdensity only, then the halos are said to be locally biased with respect to the mass. Our analysis shows that the bias must be nonlocal, since the mass overdensity is not the only quantity which matters: at the very least, the first derivative of the matter field with respect to smoothing scale plays an important role in determining halo abundances, and this is expected to make the halo-mass bias k-dependent (Musso & Sheth 2012).

That said, the critical nonlinear overdensity is of order 100× the background. This is substantially (at least 10×) larger than the rms value of the field when smoothed on the typical halo scale, so it may be that the additional terms which come from constraining the slope are irrelevant. Since in this limit our equation (23) reduces to equation (4), the halo mass function is very simply related to the probability distribution function of the nonlinearly evolved field. We are in the process of exploring this nonlinear excursion set approach further.

Our analysis also allows one to address a related ques- 
tion, having to do with the self-consistency of the approach. 
Namely, suppose we estimate halo abundances not from the 
initial field (as is usually done), but from a weakly evolved 
one. How does the prediction compare with that based on 
the initial field (the usual estimate)? If the approach is self-
consistent, these two estimates should agree. 

Let $p(M|V_E)$ denote the probability that a cell of volume $V_E$ placed randomly in the evolved distribution contains mass $M$. This distribution has mean $\bar M = \bar\rho V_E$, where $\bar\rho$ is the comoving background density, so the Eulerian density is $1+\delta_E = M/\bar M$. Local models of the evolution from the initial Lagrangian density $\delta_L$ to the Eulerian one assume that $1+\delta_E$ is a deterministic invertible function of $\delta_L$. In perturbation theory, this means that

$$\int_M^\infty dm\, p(m|V_E)\,\frac{m}{\bar M} = \int_{\delta_L(M,V_E)}^\infty d\delta\, p(\delta|s(M)) \qquad (28)$$

(Bernardeau et al. 2002; Lam & Sheth 2008). (If halos were made of discrete particles, then this mass weighting is similar to only counting cells which are centered on particles of the distribution.) Halos correspond to large $M/\bar\rho V_E$, for which $\delta_L(M,V_E) \to \delta_c(M)$. Since the right hand side here is the same as the right hand side of equation (4), the extra factor of $M/\bar M$ on the left hand side shows that it is the mass-weighted Eulerian distribution which is related to the halo mass function. This demonstrates self-consistency at least at the higher masses where equation (4) is the appropriate limit of the two (i.e., the Eulerian and Lagrangian) predictions.



5 DISCUSSION

We derived an intuitively simple formal expansion for the first crossing distribution of random walks with correlated steps, in which walks are ordered by the minimum number of times they cross the barrier from below (equation 5). The nature of the correlations between the steps is determined by the statistics of the field after smoothing (i.e. Gaussian or non-Gaussian), which in turn depend on the form of the smoothing filter. The leading order term of this expansion, equation (23), is particularly simple. It only requires that when walks cross the barrier, they do so upwards. Therefore, it requires knowledge of only the joint distribution of the walk height and its first derivative: in appropriately scaled units, these turn out to be independent of one another, which keeps the calculation straightforward.

Previous work has shown that, for Gaussian initial conditions, this approximation (i.e. neglecting all the other terms associated with walks with multiple zig-zags) leads to equation (17), which works well for all filters of current interest, and for all barriers which are monotonic functions of smoothing scale. Our equation (23) is a straightforward generalization of equation (17) to non-Gaussian fields: again, only the bivariate distribution of height and slope is required. In the large mass regime, our formula reduces to the even simpler form of equation (4), which depends on the distribution of the walk heights alone. Although perturbative non-Gaussian corrections individually blow up in this regime, this result is completely non-perturbative and exact; it simply reflects the fact that, because of the correlations, walks that reach the barrier in very few steps are very unlikely to cross it multiple times.

Equation (23) is useful for excursion set models which assume that the initial fluctuation field was non-Gaussian; indeed, this was the original motivation for this study. However, the analysis worked out so easily (in particular, equation (23) does not assume that the non-Gaussianity is weak) that it can also be used to predict the abundance of nonlinear objects from the nonlinear rather than the initial fluctuation field. We argued that this means that halo bias must be nonlocal in principle, although local bias may be a good approximation in practice. We also argued that our formulation demonstrates self-consistency of the approach, in the sense that applying it to the initial or the late-time field (i.e., the Lagrangian or Eulerian fields) yields the same estimate of halo abundances, at least at the higher masses where equation (4) is the appropriate limit. It will be interesting to explore whether this self-consistency survives in modified gravity models, where, because the linear theory growth factor becomes k-dependent, even just the linearly evolved field is rather different from the initial one.


APPENDIX A: MULTIPLE UPCROSSINGS 

In this appendix we discuss the derivation of equation (5) in a more formal way, and calculate the corrections one must account for when $s$ becomes too large, or simply if one wants to quantify the errors introduced by the approximations we made.

The result above can be derived more rigorously within a path integral formulation of excursion set theory, where one considers an ensemble of walks of $N$ steps with infinitesimal increment in variance $\Delta s = s/N$. The first crossing rate is by definition the fraction of walks that crossed for the first time at the last step, divided by the width of the step. Calling $p(\delta_1,\dots,\delta_N)$ the joint probability of a walk, this is

$$f(s) = \frac{1}{\Delta s}\int_{-\infty}^{b_1}d\delta_1\cdots\int_{-\infty}^{b_{N-1}}d\delta_{N-1}\int_{b_N}^{\infty}d\delta_N\,p(\delta_1,\dots,\delta_N), \qquad (A1)$$

where $b_i = b(s_i)$ is the value of the barrier at the scale $s_i = i\,\Delta s$ corresponding to the $i$-th step.

This expression can be written as the difference of two path integrals: a first one including all possible values of $\delta_1,\dots,\delta_{N-2}$, and a second one removing walks with at least one $\delta_i > b_i$. The former is marginalized over $\delta_1,\dots,\delta_{N-2}$, and thus is just the probability of having $\delta_{N-1} < b_{N-1}$ and $\delta_N > b_N \equiv b$, normalized to $\Delta s$. This is equal to

$$\frac{1}{\Delta s}\int_{b'}^{+\infty}dv\int_{b}^{b+(v-b')\Delta s}d\delta_N\,p(\delta_N,v), \qquad (A2)$$

where $v = (\delta_N - \delta_{N-1})/\Delta s$ and $b' = (b_N - b_{N-1})/\Delta s$. For correlated steps, $\langle(\delta_N - \delta_{N-1})^2\rangle \propto \Delta s^2$: all correlators of $v$ and $\delta_N$ tend to constants, and so does $p(\delta_N, v)$. The $\Delta s \to 0$ limit of this term thus returns the r.h.s. of equation (5), $\int_{b'}^{\infty}dv\,(v-b')\,p(b,v)$.
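A quick numerical check of this limit may help. The sketch below is an illustration only: it assumes a Gaussian walk with $\langle\delta^2\rangle = s$, $\langle\delta v\rangle = 1/2$ (which follows from $v = d\delta/ds$) and a slope variance $\langle v^2\rangle = 1/(4\gamma^2 s)$, where the constant `gamma` is a hypothetical stand-in for the filter dependence. It evaluates the finite-$\Delta s$ expression (A2) by nested midpoint quadrature and compares it with the limit $\int_{b'}^\infty dv\,(v-b')\,p(b,v)$, which for a bivariate Gaussian can be written in closed form through the conditional distribution of $v$ at fixed height. All function names are illustrative.

```python
import math

def gauss(x, mu, var):
    """Normal density with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def moments(s, gamma):
    """Assumed covariance of (delta, v) at scale s: <dd> = s, <dv> = 1/2,
    <vv> = 1/(4 gamma^2 s); gamma is a hypothetical filter-dependent constant."""
    return s, 0.5, 1.0 / (4.0 * gamma ** 2 * s)

def conditional_v(d, s, gamma):
    """Mean and variance of v given delta = d for the bivariate Gaussian."""
    sdd, sdv, svv = moments(s, gamma)
    return sdv / sdd * d, svv - sdv ** 2 / sdd

def joint_pdf(d, v, s, gamma):
    """p(delta, v) written as p(delta) * p(v | delta)."""
    mu, var = conditional_v(d, s, gamma)
    return gauss(d, 0.0, s) * gauss(v, mu, var)

def f_up(b, bprime, s, gamma):
    """Delta s -> 0 limit: int_{b'}^inf dv (v - b') p(b, v), in closed form."""
    mu, var = conditional_v(b, s, gamma)
    sig = math.sqrt(var)
    z = (mu - bprime) / sig
    # E[(v - b')_+ | delta = b] for a Gaussian v
    expected = (mu - bprime) * norm_cdf(z) + sig * gauss(z, 0.0, 1.0)
    return gauss(b, 0.0, s) * expected

def f_discrete(b, bprime, s, gamma, ds, nv=4000, nd=8):
    """Finite-step form (A2): (1/ds) int_{b'} dv int_b^{b+(v-b')ds} d(delta) p(delta, v),
    evaluated with nested midpoint rules."""
    mu, var = conditional_v(b, s, gamma)
    vmax = max(bprime, mu) + 12.0 * math.sqrt(var)   # truncate the Gaussian tail in v
    dv = (vmax - bprime) / nv
    total = 0.0
    for i in range(nv):
        v = bprime + (i + 0.5) * dv
        dd = (v - bprime) * ds / nd                   # sub-step in delta_N
        inner = sum(joint_pdf(b + (j + 0.5) * dd, v, s, gamma) for j in range(nd)) * dd
        total += inner * dv
    return total / ds
```

For example, with $b = 1.686$, $b' = 0$, $s = 1$ and `gamma = 0.5`, `f_discrete(..., ds=1e-4)` and `f_up(...)` agree to well within a tenth of a percent, and the residual shrinks linearly with `ds`, as expected from the inner integral over $\delta_N$.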

The domain of the second path integral, which corrects the error introduced by the marginalization, is, for the first $N-2$ steps, $\prod_{i=1}^{N-2}\int_{-\infty}^{b_i}d\delta_i - \prod_{i=1}^{N-2}\int_{-\infty}^{+\infty}d\delta_i$. Up to an overall minus sign, this is equal to

$$\sum_{i=1}^{N-2}\left[\prod_{j=1}^{i-1}\int_{-\infty}^{b_j}d\delta_j\right]\int_{b_i}^{+\infty}d\delta_i\left[\prod_{j=i+1}^{N-2}\int_{-\infty}^{+\infty}d\delta_j\right], \qquad (A3)$$

where the $i$-th term of the sum removes all the trajectories that had crossed for the first time at $s_i < s_{N-1}$. One can then iterate the procedure, marginalizing the first $i-2$ variables of each term, at the cost of introducing a new term with a sum up to $i-2$ to correct the error, and so on. This yields an alternating series, in which the $k$-th term, with $k$ nested sums, corrects for the trajectories with $k$ crossings miscounted in the previous terms.
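Equation (A3) rests on an identity between indicator functions, which can be checked by brute force. In the sketch below (an illustration, not part of the derivation; all names are hypothetical) each of the first $N-2$ steps is encoded only by whether $\delta_i$ lies above $b_i$, and the code verifies, for every such pattern, that the integrand of the bounded domain minus that of $\mathbb{R}^{N-2}$ equals minus the sum over first-crossing steps $i$. Since the identity holds pointwise, it also holds after integrating against $p(\delta_1,\dots,\delta_N)$.

```python
from itertools import product

def below_everywhere(above):
    """Indicator prod_i 1[delta_i < b_i], given flags 'delta_i > b_i'."""
    return int(not any(above))

def first_crossing_sum(above):
    """Sum over i of 1[delta_j < b_j for all j < i] * 1[delta_i > b_i]."""
    total = 0
    for i, a in enumerate(above):
        if a and not any(above[:i]):
            total += 1
    return total

def check_A3(n):
    """Check prod 1[below] - 1 == -(sum over first crossings) for all 2**n patterns."""
    for above in product([False, True], repeat=n):
        lhs = below_everywhere(above) - 1   # bounded domain minus all of R^n
        rhs = -first_crossing_sum(above)    # minus the terms of the sum in (A3)
        if lhs != rhs:
            return False
    return True

if __name__ == "__main__":
    print(all(check_A3(n) for n in range(1, 11)))
```

Each pattern has at most one first crossing, which is why the sum in (A3) never double counts a trajectory.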

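If we repeat the same considerations for each of the $k$ earlier crossings, introducing the velocities $v_1,\dots,v_k$ in addition to $v_{k+1} = v$, we obtain the continuum limit by replacing the $k$ sums $\Delta s\sum$ with nested integrals over the crossing scales $s_1 < \dots < s_k < s = s_{k+1}$. Finally, we divide by $k!$ and drop the constraint on the ordering of $s_1,\dots,s_k$. This gives

$$f(s) = \int_{b'}^{\infty}dv\,(v-b')\,p(b,v) + \sum_{k=1}^{\infty}\frac{(-1)^k}{k!}\int_0^s ds_1\cdots\int_0^s ds_k\, f^{(k)}(s_1,\dots,s_k,s), \qquad (A4)$$

where, calling $s \equiv s_{k+1}$, $b_j \equiv b(s_j)$ and $b'_j \equiv db/ds|_{s_j}$,

$$f^{(k)}(s_1,\dots,s_{k+1}) = \int_{b'_1}^{\infty}dv_1\cdots\int_{b'_{k+1}}^{\infty}dv_{k+1}\,\prod_{j=1}^{k+1}(v_j - b'_j)\;p(\{b_j, v_j\}). \qquad (A5)$$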

Musso & Sheth (2012) keep only the leading term of the full expression for $f(s)$, while stopping the expansion at $k = 1$ yields equation (8).

To see what these expressions imply, suppose that $p(\{b_i,v_i\}) = \prod_i p(b_i,v_i)$: this keeps the correlation between the walk height and its slope on each scale, but assumes that these are uncorrelated with the height and slope on any other scale. This makes the integrals over the $s_j$ separable, so the $k$-th term is a product of $k$ factors:

$$f(s) = f_{\rm up}(s)\sum_{k=0}^{\infty}\frac{(-1)^k}{k!}\prod_{j=1}^{k}\int_0^s ds_j\,f_{\rm up}(s_j), \qquad (A6)$$

where $f_{\rm up}(<s) = \int_0^s ds'\,f_{\rm up}(s')$, and $f_{\rm up}$ is the leading term in equation (A4) (equation (5) in the main text). If we further assume that the barrier is constant, then each of the terms in the product is the same, making

$$f(s) = f_{\rm up}(s)\,\exp[-f_{\rm up}(<s)]. \qquad (A7)$$

This final expression is the same as equation (A3) of Bond et al. (1991).

Since $f_{\rm up}(<s)$ increases as $s$ increases, and even exceeds unity at large enough $s$, the effect of including the extra terms is to damp $f_{\rm up}(s)$ at large $s$. For example, the expression above indicates there will be an approximately 15% downward correction at $s \sim \delta_c^2$. This turns out to be slightly larger than the actual correction because, in fact, $p(\{b_i,v_i\}) \neq \prod_i p(b_i,v_i)$. Although including the additional corrections which come from correlations between scales complicates the analysis, we believe the algebra above illustrates nicely how the inclusion of multiple upcrossings will impact the result as $s$ increases.
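The arithmetic behind (A7) and the 15% estimate can be checked directly. The sketch below treats $F = f_{\rm up}(<s)$ as a given number, since its actual value depends on the filter and barrier, which are not modelled here. Under the factorized assumption it compares truncations of the alternating series with the resummed exponential: keeping only $k = 0$ corresponds to the leading upcrossing term, and $k \le 1$ to the first correction. It also inverts $e^{-F} = 0.85$, showing that a 15% suppression corresponds to $f_{\rm up}(<s) \approx 0.16$. Function names are illustrative.

```python
import math

def damping_partial_sum(F, kmax):
    """Factorized series (A6) divided by f_up(s), truncated at k = kmax:
    sum_{k=0}^{kmax} (-F)^k / k!, with F = f_up(<s) treated as an input."""
    return sum((-F) ** k / math.factorial(k) for k in range(kmax + 1))

def damping_resummed(F):
    """Resummed constant-barrier result (A7): f(s) / f_up(s) = exp(-F)."""
    return math.exp(-F)

# Value of f_up(<s) for which exp(-F) = 0.85, i.e. a 15% downward correction.
F_15 = -math.log(0.85)

if __name__ == "__main__":
    for kmax in (0, 1, 2, 5):
        print(kmax, damping_partial_sum(F_15, kmax), damping_resummed(F_15))
```

For small $F$ the series converges quickly, so the $k \le 1$ truncation is already close to the exponential; the truncations differ noticeably only once $F$ approaches unity, which is the large-$s$ regime described above.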



APPENDIX B: THE NON-GAUSSIAN PDF 

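Instead of using the Edgeworth expansion, the non-Gaussian probability distribution can be obtained by applying a differential operator to its Gaussian counterpart, as

$$p(B;s) = e^{V}\,\frac{e^{-B^2/2}}{\sqrt{2\pi s}}, \qquad (B1)$$

where $V = \sum_{n=3}^{\infty}\frac{\langle\Delta^n\rangle_c}{n!}(-\partial_B)^n$. Expanding the exponential gives

$$e^{V} = 1 + \sum_{i=3}^{\infty}\frac{\langle\Delta^i\rangle_c}{i!}(-\partial_B)^i + \frac{1}{2!}\sum_{i,j=3}^{\infty}\frac{\langle\Delta^i\rangle_c\langle\Delta^j\rangle_c}{i!\,j!}(-\partial_B)^{i+j} + \dots \qquad (B2)$$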



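The probability distribution can then be written in terms of the Hermite polynomials $H_n(B) = e^{B^2/2}(-\partial_B)^n e^{-B^2/2}$ as

$$p(B;s) = \frac{e^{-B^2/2}}{\sqrt{2\pi s}}\left[1 + \sum_{i=3}^{\infty}\frac{\langle\Delta^i\rangle_c}{i!}H_i(B) + \sum_{i,j=3}^{\infty}\frac{\langle\Delta^i\rangle_c\langle\Delta^j\rangle_c}{2!\,i!\,j!}H_{i+j}(B) + \sum_{i,j,k=3}^{\infty}\frac{\langle\Delta^i\rangle_c\langle\Delta^j\rangle_c\langle\Delta^k\rangle_c}{3!\,i!\,j!\,k!}H_{i+j+k}(B) + \dots\right], \qquad (B3)$$

which is the Gram-Charlier expansion usually referred to in the literature.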
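A minimal numerical sketch of (B3) may make the construction concrete. It builds $H_n(B)$ from the standard recursion $H_{n+1} = B H_n - n H_{n-1}$, which the Hermite polynomials defined above satisfy, and evaluates the series truncated after its first sum. The cumulant values are arbitrary placeholders, and the sketch assumes a unit-variance Gaussian prefactor $e^{-B^2/2}/\sqrt{2\pi}$ so that the result is a density in $B$. The check confirms that the Hermite corrections integrate to zero, so the truncated series stays normalized. Names are illustrative.

```python
import math

def hermite(n, x):
    """H_n(x) = e^{x^2/2} (-d/dx)^n e^{-x^2/2}, via H_{k+1} = x H_k - k H_{k-1}."""
    h_prev, h = 1.0, x          # H_0, H_1
    if n == 0:
        return h_prev
    for k in range(1, n):
        h_prev, h = h, x * h - k * h_prev
    return h

def gram_charlier_first_order(B, cumulants):
    """(B3) truncated after the single sum: phi(B) [1 + sum_i c_i H_i(B) / i!].

    cumulants: dict {i: <Delta^i>_c} for i >= 3 (placeholder values).
    Assumes a unit-variance Gaussian phi(B) = exp(-B^2/2) / sqrt(2 pi).
    """
    phi = math.exp(-B * B / 2.0) / math.sqrt(2.0 * math.pi)
    series = 1.0 + sum(c * hermite(i, B) / math.factorial(i) for i, c in cumulants.items())
    return phi * series

def integrate(f, a, b, n=20000):
    """Midpoint-rule quadrature of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) for k in range(n)) * h

if __name__ == "__main__":
    cums = {3: 0.3, 4: 0.1}   # placeholder cumulants
    print(integrate(lambda B: gram_charlier_first_order(B, cums), -12.0, 12.0))
```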

Although formally correct, this expression is not convenient for dealing with very large masses. In this regime, $B$ can be so large that one might also have $\langle\Delta^3\rangle_c B^3 \gtrsim 1$ (see D'Amico et al. 2011 for a detailed discussion), and in order to make reliable predictions one cannot truncate the series but needs to sum an infinite number of terms. To avoid doing this, it is convenient to resum the series above into an exponential and write

$$p(B;s) = \frac{e^{W(B;s)}}{\sqrt{2\pi s}}, \qquad (B4)$$

where the function $W$ is

$$W(B;s) = -\frac{B^2}{2} + \sum_{i=3}^{\infty}\frac{\langle\Delta^i\rangle_c}{i!}H_i(B) + \frac{1}{2!}\sum_{i,j=3}^{\infty}\frac{\langle\Delta^i\rangle_c\langle\Delta^j\rangle_c}{i!\,j!}h_{ij}(B) + \frac{1}{3!}\sum_{i,j,k=3}^{\infty}\frac{\langle\Delta^i\rangle_c\langle\Delta^j\rangle_c\langle\Delta^k\rangle_c}{i!\,j!\,k!}h_{ijk}(B) + \dots, \qquad (B5)$$

and where we have defined the modified polynomials

$$h_{ij} = H_{i+j} - H_iH_j, \qquad (B6)$$

$$h_{ijk} = H_{i+j+k} - H_iH_jH_k - (H_i h_{jk} + \text{perms.}) = H_{i+j+k} + 2H_iH_jH_k - (H_iH_{j+k} + \text{perms.}), \qquad (B7)$$

$$h_{ijkl} = H_{i+j+k+l} - H_iH_jH_kH_l - (H_i h_{jkl} + \text{perms.}) - (h_{ij}h_{kl} + \text{perms.}) - (H_iH_j h_{kl} + \text{perms.}), \qquad (B8)$$

and so on. At first sight this is hardly going to help, since we are still dealing with an infinite series of terms that diverge when $B \gg 1$. However, one can check that $h_{ij}$ has degree $i+j-2$, $h_{ijk}$ has degree $i+j+k-4$, and similarly for higher order ones, so that $W(B;s)$ is a better behaved expansion when $B$ is large. Moreover, thanks to the exponential representation, truncating the expansion at any order is guaranteed to return a positive definite probability distribution.
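The degree counting can be verified with exact integer polynomial arithmetic. The sketch below (illustrative helper names, coefficients stored lowest power first) builds $H_n$ from the recursion $H_{n+1} = B H_n - n H_{n-1}$, forms $h_{ij}$ from (B6) and $h_{ijk}$ from the second form of (B7), and checks the degrees $i+j-2$ and $i+j+k-4$ for small indices. For instance, it gives $h_{33} = -9B^4 + 36B^2 - 15$.

```python
def trim(p):
    """Drop trailing zero coefficients (highest powers)."""
    while len(p) > 1 and p[-1] == 0:
        p = p[:-1]
    return p

def padd(*ps):
    """Sum of polynomials given as coefficient lists."""
    n = max(len(p) for p in ps)
    return trim([sum(p[k] if k < len(p) else 0 for p in ps) for k in range(n)])

def pmul(p, q):
    """Product of two polynomials."""
    out = [0] * (len(p) + len(q) - 1)
    for a, pa in enumerate(p):
        for b, qb in enumerate(q):
            out[a + b] += pa * qb
    return trim(out)

def pscale(p, c):
    return trim([c * x for x in p])

def H(n):
    """Integer coefficients of H_n via H_{k+1} = B H_k - k H_{k-1}."""
    prev, cur = [1], [0, 1]
    if n == 0:
        return prev
    for k in range(1, n):
        prev, cur = cur, padd(pmul([0, 1], cur), pscale(prev, -k))
    return cur

def h2(i, j):
    """(B6): h_ij = H_{i+j} - H_i H_j."""
    return padd(H(i + j), pscale(pmul(H(i), H(j)), -1))

def h3(i, j, k):
    """Second form of (B7): H_{i+j+k} + 2 H_i H_j H_k - (H_i H_{j+k} + perms.)."""
    cross = padd(pmul(H(i), H(j + k)), pmul(H(j), H(i + k)), pmul(H(k), H(i + j)))
    return padd(H(i + j + k), pscale(pmul(pmul(H(i), H(j)), H(k)), 2), pscale(cross, -1))

def degree(p):
    return len(trim(p)) - 1

if __name__ == "__main__":
    print(h2(3, 3))   # coefficients of h_33, lowest power first
```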

This result has a nice interpretation in terms of Feynman diagrams. If one assigns a power of $B$ to each external leg and uses $(-1)^n\langle\Delta^n\rangle_c/n!$ as vertices and $-1$ as propagator, each Hermite polynomial in Eq. (B3) represents the sum of all possible ways to connect the vertices listed in its coefficient, with all possible combinations of external and internal lines and the correct combinatorial factors. For instance, $\langle\Delta^3\rangle_c H_3(B)$ represents the one tree-level graph with three external legs (whence $B^3$) and the three one-loop graphs with one external leg (whence $-3B$), each containing just one cubic vertex. In this language, $W$ becomes the generator of the connected graphs; these are obtained by removing from each $H_{ijk\dots}$ all the disconnected pieces, that is, the products of two or more lower order connected terms.

In the large-$B$ limit, it is consistent to approximate this expansion by keeping the leading term of each polynomial (which is equivalent to neglecting loop diagrams order by order). However, the smaller $s$ gets, the higher the order to which the expansion must be carried before it can safely be truncated. Up to 4th order one recovers

$$W(B;s) \simeq -\frac{B^2}{2} + \frac{\langle\Delta^3\rangle_c}{3!}B^3 + \frac{\langle\Delta^4\rangle_c - 3\langle\Delta^3\rangle_c^2}{4!}B^4, \qquad (B9)$$

which is enough to describe the mass function over the range of scales of interest, as discussed by D'Amico et al. (2011).
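To illustrate the positivity argument, the sketch below evaluates the fourth-order form (B9) and, for comparison, the Gram-Charlier series (B3) truncated after its first sum (keeping only $i = 3, 4$), at a large negative $B$. The cumulant values $\langle\Delta^3\rangle_c = 0.3$ and $\langle\Delta^4\rangle_c = 0.1$ are arbitrary placeholders, and the overall $1/\sqrt{2\pi s}$ normalization is dropped since it does not affect signs. The truncated Gram-Charlier density turns negative there, while $e^{W}$ cannot. The $-3\langle\Delta^3\rangle_c^2$ term in (B9) comes from the leading $-9B^4$ term of $h_{33}$, weighted by $\frac{1}{2!}\frac{1}{3!\,3!}$ in (B5).

```python
import math

def W_fourth_order(B, k3, k4):
    """Equation (B9): -B^2/2 + k3 B^3/3! + (k4 - 3 k3^2) B^4/4!, with kn = <Delta^n>_c."""
    return -B ** 2 / 2.0 + k3 * B ** 3 / 6.0 + (k4 - 3.0 * k3 ** 2) * B ** 4 / 24.0

def gram_charlier_truncated(B, k3, k4):
    """Unnormalized (B3) truncated after the first sum, keeping i = 3, 4 only."""
    H3 = B ** 3 - 3.0 * B
    H4 = B ** 4 - 6.0 * B ** 2 + 3.0
    return math.exp(-B ** 2 / 2.0) * (1.0 + k3 * H3 / 6.0 + k4 * H4 / 24.0)

if __name__ == "__main__":
    k3, k4 = 0.3, 0.1   # placeholder cumulants, for illustration only
    for B in (-5.0, -3.0, 0.0, 3.0):
        print(B, gram_charlier_truncated(B, k3, k4), math.exp(W_fourth_order(B, k3, k4)))
```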

Here, if the combinations of connected moments which appear in the expansion above were functions of $B$ only, then the resulting pdf would be self-similar in the sense used in the previous sections. The fact that they are not, in general, functions of the scaling variable $B$ means that the first term in equation (23) will result in an additional contribution to $f(s)$, which must be added to equation (17).